The Sensitivity of Latent Dirichlet Allocation for Information Retrieval
Abstract
It has been shown that topic models, when used in the appropriate form, increase precision in information retrieval. Latent Dirichlet Allocation (LDA) is a generative topic model that models documents using a Dirichlet prior. With this topic model, we can obtain a fitted Dirichlet parameter that maximizes the likelihood of the document set. In this article, we examine the sensitivity of LDA to the Dirichlet parameter when used for information retrieval. We compare the topic model computation times, storage requirements, and retrieval precision of fitted LDA with those of LDA with a uniform Dirichlet prior. The results show that there is no significant benefit to using fitted LDA over LDA with a constant Dirichlet parameter, which shows that LDA is insensitive to the Dirichlet parameter when used for information retrieval.
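To make the role of the Dirichlet parameter concrete, the following is a minimal sketch of the LDA generative process for a single document, run once with a constant (uniform) parameter and once with a non-uniform one standing in for a fitted parameter. It is an illustration only, not the experimental setup of the article: the topic-word table, the two parameter vectors, and the function names `sample_dirichlet` and `generate_document` are all hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalising Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(alpha, topic_word, length, rng):
    """Generate one document under the LDA generative process.

    alpha      -- Dirichlet parameter, one entry per topic
    topic_word -- topic_word[k][w] is the probability of word w in topic k
    length     -- number of word tokens to generate
    """
    theta = sample_dirichlet(alpha, rng)  # per-document topic mixture
    topic_ids = range(len(theta))
    vocab_ids = range(len(topic_word[0]))
    words = []
    for _ in range(length):
        z = rng.choices(topic_ids, weights=theta)[0]          # pick a topic
        w = rng.choices(vocab_ids, weights=topic_word[z])[0]  # pick a word
        words.append(w)
    return theta, words


rng = random.Random(0)

# Hypothetical toy model: 2 topics over a 4-word vocabulary.
topic_word = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]

uniform_alpha = [1.0, 1.0]  # constant parameter, same value for every topic
fitted_alpha = [0.3, 1.7]   # assumed stand-in for a fitted parameter

theta_u, doc_u = generate_document(uniform_alpha, topic_word, 20, rng)
theta_f, doc_f = generate_document(fitted_alpha, topic_word, 20, rng)
```

The only difference between the two runs is the vector passed as `alpha`; everything downstream of the topic mixture `theta` is identical. This is the quantity whose influence on retrieval the abstract reports to be insignificant.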
Similar resources
Supervised acoustic topic model for unstructured audio information retrieval
We introduce a modified version of the acoustic topic model, which assumes an audio signal consists of latent acoustic topics and each topic can be interpreted as a distribution over acoustic words, for unstructured audio information retrieval applications. The proposed supervised acoustic topic model is based on supervised latent Dirichlet allocation (sLDA) while the conventional acoustic topi...
Multi-label Classification Algorithm Based on Latent Dirichlet Allocation Model
The Vector Space Model (VSM) is used frequently in Text Classification (TC). However, it usually produces a high-dimensional feature space, which leads to a huge cost in computation and storage. Recently, statistical topic models have played an important role in the fields of Information Retrieval (IR), TC, and Document Clustering. In this chapter, we try to use a kind of statistical model, Latent Dirichlet Allo...
Comparison of Topic Language Models for Query Disambiguation in Information Retrieval
A long-standing challenge in information retrieval is to disambiguate query words for more precise search results. A query word with two or more meanings (polysemy) degrades the precision of information retrieval systems. There is a need for correct and effective information retrieval in many information systems such as health care and customer relationship manag...
Are Semantically Coherent Topic Models Useful for Ad Hoc Information Retrieval?
Current topic modeling approaches for Information Retrieval do not allow query-oriented latent topics to be modeled explicitly. Moreover, the semantic coherence of the topics has never been considered in this field. We propose a model-based feedback approach that learns Latent Dirichlet Allocation topic models on the top-ranked pseudo-relevant feedback, and we measure the semantic coherence of those...
Document Clustering and Visualization with Latent Dirichlet Allocation and Self-Organizing Maps
Clustering and visualization of large text document collections aid in browsing, navigation, and information retrieval. We present a document clustering and visualization method based on Latent Dirichlet Allocation and self-organizing maps (LDA-SOM). LDA-SOM clusters documents based on topical content and renders clusters in an intuitive two-dimensional format. Document topics are inferred usin...